CP-NAS: Child-Parent Neural Architecture Search for 1-bit CNNs
FIGURE 4.5
The main framework of the Child-Parent model. The Child-Parent model focuses on binarized
architecture search (left) and binarized optimization (right).
[Figure labels: CP NAS (search space, Parent, Child, performance evaluation, reduce);
CP Optimization (Parent weights, Child weights, binarized, MSE loss); CP Model.]
Thus, we can define it for each operation of the sampled network as
$$ z_{k,t}^{(i,j)} = \beta_P \left( A_{P,t} - A_{C,t} \right) + A_{C,t} \tag{4.15} $$
where $A_{P,t}$ and $A_{C,t}$ represent the network performance, measured as the accuracy of
the full-precision model (Parent) and of the binarized model (Child) on the validation dataset,
and $\beta_P$ is a hyperparameter that controls the performance loss. The indices $i$ and $j$
identify the nodes that form the edge $(i, j)$ shown in Fig. 4.6, $k$ is the index of the
operation on that edge, and $t$ denotes the $t$-th sampling process. Note that we use the
performance of the sampled network to evaluate the performance of the operations selected in it.
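As a quick illustration of Eq. (4.15), the sketch below computes the measure for one sampled network and assigns the same value to every operation selected in it, as the note above describes. The function names, the dictionary layout, and the accuracy values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def cp_performance(acc_parent: float, acc_child: float, beta_p: float) -> float:
    """Eq. (4.15): z = beta_P * (A_P - A_C) + A_C."""
    return beta_p * (acc_parent - acc_child) + acc_child


def score_sampled_operations(
    sampled_ops: Dict[Tuple[int, int], int],
    acc_parent: float,
    acc_child: float,
    beta_p: float,
) -> Dict[Tuple[int, int, int], float]:
    """Give every selected operation k on edge (i, j) the network-level score.

    sampled_ops maps an edge (i, j) to the index k of the operation sampled
    on that edge (a hypothetical encoding of the sampled network).
    """
    z = cp_performance(acc_parent, acc_child, beta_p)
    return {(i, j, k): z for (i, j), k in sampled_ops.items()}


# Hypothetical validation accuracies for one sampling step t.
scores = score_sampled_operations(
    {(0, 2): 3, (1, 2): 0}, acc_parent=0.90, acc_child=0.80, beta_p=2.0
)
```

With $\beta_P = 0$ the measure reduces to the Child accuracy alone, and with $\beta_P = 1$ it equals the Parent accuracy; other values weight the Parent-Child gap accordingly.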
CP-NAS [304] uses the accuracy on the validation dataset to guide the search process directly
and also draws on the full-precision model to better estimate the full potential that the
binarized model can ultimately reach. Additional details are provided in the following section.
As shown in Fig. 4.5, the traditional teacher-student model [87] transfers the generalization
ability of a teacher model to a smaller student model by using the class probabilities as
“soft targets.” The child-parent model, in contrast, focuses on a performance measure that is
particularly suited to NAS-based network binarization. Furthermore, the loss function of the
teacher-student model is constrained to the feature maps or the output, while ours focuses on
the kernel weights.
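To make the contrast with output-level distillation concrete, the following minimal sketch computes an MSE loss directly between full-precision (Parent) kernel weights and binarized (Child) kernel weights, matching the "MSE loss" label in Fig. 4.5. The binarization rule (sign with a mean-absolute-value scale) and all names are assumptions made for illustration; the text here does not specify the exact form used by CP-NAS.

```python
from typing import List


def binarize(weights: List[float]) -> List[float]:
    """Assumed binarization: sign(w) scaled by the mean absolute weight."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def kernel_mse(parent_weights: List[float], child_weights: List[float]) -> float:
    """Mean squared error between Parent and Child kernel weights."""
    if len(parent_weights) != len(child_weights):
        raise ValueError("kernels must have the same number of weights")
    squared = [(p - c) ** 2 for p, c in zip(parent_weights, child_weights)]
    return sum(squared) / len(squared)


# Hypothetical flattened kernel of the Parent model.
parent = [0.5, -0.25, 0.75, -1.0]
child = binarize(parent)          # every entry becomes +alpha or -alpha
loss = kernel_mse(parent, child)  # penalizes the weight-level gap
```

Unlike a soft-target loss, this quantity depends only on the weights, so it can be evaluated without a forward pass over the data.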
FIGURE 4.6
The cell architecture for CP-NAS. A cell includes 2 input nodes, 4 intermediate nodes, and
14 edges.
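The edge count in the caption can be checked with a short enumeration. The sketch assumes DARTS-style connectivity, in which each intermediate node receives an edge from every preceding node (both inputs and earlier intermediate nodes); this connectivity and the node numbering are assumptions, although they reproduce the stated total of 14 edges.

```python
from typing import List, Tuple


def cell_edges(num_inputs: int = 2, num_intermediate: int = 4) -> List[Tuple[int, int]]:
    """List edges (i, j) where node j is intermediate and i is any earlier node.

    Nodes 0 .. num_inputs-1 are inputs; the following num_intermediate
    nodes are intermediate (hypothetical numbering).
    """
    edges = []
    for j in range(num_inputs, num_inputs + num_intermediate):
        for i in range(j):
            edges.append((i, j))
    return edges


edges = cell_edges()
num_edges = len(edges)  # 2 + 3 + 4 + 5 under the assumed connectivity
```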